PM 566 Assignment 2

Author

Dana Gonzalez

Data Wrangling

Load and Merge Datasets

individual <- read.csv("/Users/danagonzalez/Downloads/chs_individual.csv")
regional <- read.csv("/Users/danagonzalez/Downloads/chs_regional.csv")

combined <- merge(individual, regional, by = "townname", all = FALSE)

nrow(combined)
[1] 1200
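Because `merge()` with `all = FALSE` performs an inner join, any child whose `townname` has no match in the regional file would be silently dropped. A duplicated town in the regional file would also duplicate children. Below is a minimal sketch of a post-merge check. `check_merge()` is a hypothetical helper, and the two toy data frames stand in for the real CSV files.

```r
# Hypothetical helper: summarise what an inner join on `key` kept and dropped
check_merge <- function(left, right, key) {
  merged <- merge(left, right, by = key, all = FALSE)
  list(
    n_left    = nrow(left),
    n_merged  = nrow(merged),
    # keys in `left` with no partner in `right` (their rows are dropped)
    unmatched = setdiff(unique(left[[key]]), right[[key]]),
    # duplicated keys in `right` would multiply rows in the merged result
    dup_right = any(duplicated(right[[key]]))
  )
}

# Toy stand-ins for chs_individual.csv and chs_regional.csv (not real data)
individual_toy <- data.frame(
  sid      = 1:5,
  townname = c("Alpine", "Alpine", "Lompoc", "Upland", "Riverside"),
  stringsAsFactors = FALSE
)
regional_toy <- data.frame(
  townname  = c("Alpine", "Lompoc", "Upland"),
  pm25_mass = c(10, 20, 30),
  stringsAsFactors = FALSE
)

res <- check_merge(individual_toy, regional_toy, "townname")
res$n_merged   # 4: the Riverside row has no regional match
res$unmatched  # "Riverside"
```

On the real data, `check_merge(individual, regional, "townname")` would confirm that no children were lost in the merge if it returned an empty `unmatched` and an `n_merged` equal to `n_left`.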
summary(combined)
   townname              sid              male            race          
 Length:1200        Min.   :   1.0   Min.   :0.0000   Length:1200       
 Class :character   1st Qu.: 528.8   1st Qu.:0.0000   Class :character  
 Mode  :character   Median :1041.5   Median :0.0000   Mode  :character  
                    Mean   :1037.5   Mean   :0.4917                     
                    3rd Qu.:1554.2   3rd Qu.:1.0000                     
                    Max.   :2053.0   Max.   :1.0000                     
                                                                        
    hispanic          agepft           height        weight      
 Min.   :0.0000   Min.   : 8.961   Min.   :114   Min.   : 42.00  
 1st Qu.:0.0000   1st Qu.: 9.610   1st Qu.:135   1st Qu.: 65.00  
 Median :0.0000   Median : 9.906   Median :139   Median : 74.00  
 Mean   :0.4342   Mean   : 9.924   Mean   :139   Mean   : 79.33  
 3rd Qu.:1.0000   3rd Qu.:10.177   3rd Qu.:143   3rd Qu.: 89.00  
 Max.   :1.0000   Max.   :12.731   Max.   :165   Max.   :207.00  
                  NA's   :89       NA's   :89    NA's   :89      
      bmi            asthma       active_asthma  father_asthma    
 Min.   :11.30   Min.   :0.0000   Min.   :0.00   Min.   :0.00000  
 1st Qu.:15.78   1st Qu.:0.0000   1st Qu.:0.00   1st Qu.:0.00000  
 Median :17.48   Median :0.0000   Median :0.00   Median :0.00000  
 Mean   :18.50   Mean   :0.1463   Mean   :0.19   Mean   :0.08318  
 3rd Qu.:20.35   3rd Qu.:0.0000   3rd Qu.:0.00   3rd Qu.:0.00000  
 Max.   :41.27   Max.   :1.0000   Max.   :1.00   Max.   :1.00000  
 NA's   :89      NA's   :31                      NA's   :106      
 mother_asthma        wheeze          hayfever         allergy      
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
 Mean   :0.1023   Mean   :0.3313   Mean   :0.1747   Mean   :0.2929  
 3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
 NA's   :56       NA's   :71       NA's   :118      NA's   :63      
  educ_parent        smoke             pets           gasstove     
 Min.   :1.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:2.000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:1.0000  
 Median :3.000   Median :0.0000   Median :1.0000   Median :1.0000  
 Mean   :2.797   Mean   :0.1638   Mean   :0.7667   Mean   :0.7815  
 3rd Qu.:3.000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :5.000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
 NA's   :64      NA's   :40                        NA's   :33      
      fev              fvc            mmef          pm25_mass     
 Min.   : 984.8   Min.   : 895   Min.   : 757.6   Min.   : 5.960  
 1st Qu.:1809.0   1st Qu.:2041   1st Qu.:1994.0   1st Qu.: 7.615  
 Median :2022.7   Median :2293   Median :2401.5   Median :10.545  
 Mean   :2031.3   Mean   :2324   Mean   :2398.8   Mean   :14.362  
 3rd Qu.:2249.7   3rd Qu.:2573   3rd Qu.:2793.8   3rd Qu.:20.988  
 Max.   :3323.7   Max.   :3698   Max.   :4935.9   Max.   :29.970  
 NA's   :95       NA's   :97     NA's   :106                      
    pm25_so4        pm25_no3         pm25_nh4         pm25_oc      
 Min.   :0.790   Min.   : 0.730   Min.   :0.4100   Min.   : 1.450  
 1st Qu.:1.077   1st Qu.: 1.538   1st Qu.:0.7375   1st Qu.: 2.520  
 Median :1.815   Median : 2.525   Median :1.1350   Median : 4.035  
 Mean   :1.876   Mean   : 4.488   Mean   :1.7642   Mean   : 4.551  
 3rd Qu.:2.605   3rd Qu.: 7.338   3rd Qu.:2.7725   3rd Qu.: 5.350  
 Max.   :3.230   Max.   :12.200   Max.   :4.2500   Max.   :11.830  
                                                                   
    pm25_ec          pm25_om          pm10_oc          pm10_ec      
 Min.   :0.1300   Min.   : 1.740   Min.   : 1.860   Min.   :0.1400  
 1st Qu.:0.4000   1st Qu.: 3.020   1st Qu.: 3.228   1st Qu.:0.4100  
 Median :0.5850   Median : 4.840   Median : 5.170   Median :0.5950  
 Mean   :0.7358   Mean   : 5.460   Mean   : 5.832   Mean   :0.7525  
 3rd Qu.:1.1750   3rd Qu.: 6.418   3rd Qu.: 6.855   3rd Qu.:1.1975  
 Max.   :1.3600   Max.   :14.200   Max.   :15.160   Max.   :1.3900  
                                                                    
    pm10_tc           formic          acetic           hcl        
 Min.   : 1.990   Min.   :0.340   Min.   :0.750   Min.   :0.2200  
 1st Qu.: 3.705   1st Qu.:0.720   1st Qu.:2.297   1st Qu.:0.3250  
 Median : 6.505   Median :1.105   Median :2.910   Median :0.4350  
 Mean   : 6.784   Mean   :1.332   Mean   :3.010   Mean   :0.4208  
 3rd Qu.: 8.430   3rd Qu.:1.765   3rd Qu.:4.000   3rd Qu.:0.4625  
 Max.   :16.440   Max.   :2.770   Max.   :5.140   Max.   :0.7300  
                                                                  
      hno3           o3_max          o3106           o3_24      
 Min.   :0.430   Min.   :38.27   Min.   :28.22   Min.   :18.22  
 1st Qu.:1.593   1st Qu.:49.93   1st Qu.:41.90   1st Qu.:23.31  
 Median :2.455   Median :64.05   Median :46.74   Median :27.59  
 Mean   :2.367   Mean   :60.16   Mean   :47.76   Mean   :30.23  
 3rd Qu.:3.355   3rd Qu.:67.69   3rd Qu.:55.24   3rd Qu.:32.39  
 Max.   :4.070   Max.   :84.44   Max.   :67.01   Max.   :57.76  
                                                                
      no2             pm10          no_24hr         pm2_5_fr    
 Min.   : 4.60   Min.   :18.40   Min.   : 2.05   Min.   : 9.01  
 1st Qu.:12.12   1st Qu.:20.71   1st Qu.: 4.74   1st Qu.:10.28  
 Median :16.40   Median :29.64   Median :12.68   Median :22.23  
 Mean   :18.99   Mean   :32.64   Mean   :16.21   Mean   :19.79  
 3rd Qu.:23.24   3rd Qu.:39.16   3rd Qu.:26.90   3rd Qu.:27.73  
 Max.   :37.97   Max.   :70.39   Max.   :42.95   Max.   :31.55  
                                 NA's   :100     NA's   :300    
     iacid           oacid        total_acids          lon        
 Min.   :0.760   Min.   :1.090   Min.   : 1.520   Min.   :-120.7  
 1st Qu.:1.835   1st Qu.:2.978   1st Qu.: 4.930   1st Qu.:-118.8  
 Median :2.825   Median :4.135   Median : 6.370   Median :-117.7  
 Mean   :2.788   Mean   :4.342   Mean   : 6.708   Mean   :-118.3  
 3rd Qu.:3.817   3rd Qu.:5.982   3rd Qu.: 9.395   3rd Qu.:-117.4  
 Max.   :4.620   Max.   :7.400   Max.   :11.430   Max.   :-116.8  
                                                                  
      lat       
 Min.   :32.84  
 1st Qu.:33.93  
 Median :34.10  
 Mean   :34.20  
 3rd Qu.:34.65  
 Max.   :35.49  
                

Impute Data

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
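# Most frequent value of a vector. NA counts as a value, so columns where
# NA is the most common entry (e.g., agepft, fev) return NA below.
get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}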

modes <- sapply(combined, get_mode)

modes
      townname            sid           male           race       hispanic 
      "Alpine"          "841"            "0"            "W"            "0" 
        agepft         height         weight            bmi         asthma 
            NA             NA             NA             NA            "0" 
 active_asthma  father_asthma  mother_asthma         wheeze       hayfever 
           "0"            "0"            "0"            "0"            "0" 
       allergy    educ_parent          smoke           pets       gasstove 
           "0"            "3"            "0"            "1"            "1" 
           fev            fvc           mmef      pm25_mass       pm25_so4 
            NA             NA             NA         "8.74"         "1.73" 
      pm25_no3       pm25_nh4        pm25_oc        pm25_ec        pm25_om 
        "1.59"         "0.88"         "2.54"          "0.4"         "3.04" 
       pm10_oc        pm10_ec        pm10_tc         formic         acetic 
        "3.25"         "0.41"         "3.75"         "1.03"         "2.49" 
           hcl           hno3         o3_max          o3106          o3_24 
        "0.46"         "1.98"        "65.82"        "55.05"        "41.23" 
           no2           pm10        no_24hr       pm2_5_fr          iacid 
       "12.18"        "24.73"         "2.48"             NA         "2.39" 
         oacid    total_acids            lon            lat 
        "3.52"          "5.5" "-116.7664109"   "32.8350521" 
# Continuous body measures: fill with the overall mean from summary() above
combined$agepft[is.na(combined$agepft)] <- 9.924
combined$height[is.na(combined$height)] <- 139
combined$weight[is.na(combined$weight)] <- 79.33
combined$bmi[is.na(combined$bmi)] <- 18.5
# Binary and categorical variables: fill with the mode computed above
combined$asthma[is.na(combined$asthma)] <- 0
combined$father_asthma[is.na(combined$father_asthma)] <- 0
combined$mother_asthma[is.na(combined$mother_asthma)] <- 0
combined$wheeze[is.na(combined$wheeze)] <- 0
combined$hayfever[is.na(combined$hayfever)] <- 0
combined$allergy[is.na(combined$allergy)] <- 0
combined$educ_parent[is.na(combined$educ_parent)] <- 3
combined$smoke[is.na(combined$smoke)] <- 0
combined$gasstove[is.na(combined$gasstove)] <- 1
# Lung-function measures: fill with the overall mean
combined$fev[is.na(combined$fev)] <- 2031.3
combined$fvc[is.na(combined$fvc)] <- 2324
combined$mmef[is.na(combined$mmef)] <- 2398.8
# Regional pollutants: no_24hr uses its mode (2.48), pm2_5_fr its mean (19.79)
combined$no_24hr[is.na(combined$no_24hr)] <- 2.48
combined$pm2_5_fr[is.na(combined$pm2_5_fr)] <- 19.79
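The fill values above were copied from the printed summaries. As a result they are rounded, and they would need to be updated by hand if the data changed. Below is a minimal sketch that computes them instead. It assumes the same rule: the mean for continuous columns and the mode for binary or categorical ones. `mode_value()` and `impute_columns()` are hypothetical helpers, and the toy data frame is illustrative only. Unlike `get_mode()`, `mode_value()` ignores NA, so it never returns NA as the mode.

```r
# Hypothetical helper: most frequent non-missing value
mode_value <- function(v) {
  v <- v[!is.na(v)]
  u <- unique(v)
  u[which.max(tabulate(match(v, u)))]
}

# Hypothetical helper: mean for `mean_cols`, mode for `mode_cols`
impute_columns <- function(df, mean_cols, mode_cols) {
  for (col in mean_cols) {
    df[[col]][is.na(df[[col]])] <- mean(df[[col]], na.rm = TRUE)
  }
  for (col in mode_cols) {
    df[[col]][is.na(df[[col]])] <- mode_value(df[[col]])
  }
  df
}

# Toy data (not CHS values)
toy <- data.frame(bmi = c(15, NA, 21, 18), asthma = c(0, 1, NA, 0))
filled <- impute_columns(toy, mean_cols = "bmi", mode_cols = "asthma")
filled$bmi     # 15 18 21 18 (NA replaced by the mean, 18)
filled$asthma  # 0 1 0 0 (NA replaced by the mode, 0)
```

Applied to the real data, a call such as `impute_columns(combined, mean_cols = c("agepft", "height", "weight", "bmi"), mode_cols = c("asthma", "smoke", "gasstove"))` would fill with exact means rather than rounded ones.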

Create New Obesity Variable and Summary Table

combined <- combined %>%
  mutate(obesity_level = case_when(
    bmi < 14 ~ "Underweight",
    bmi >= 14 & bmi < 22 ~ "Normal",
    bmi >= 22 & bmi < 24 ~ "Overweight",
    bmi >= 24 ~ "Obese"))

obesity_summary <- combined %>%
  group_by(obesity_level) %>%
  summarise(
    min_bmi = min(bmi, na.rm = TRUE),
    max_bmi = max(bmi, na.rm = TRUE),
    observations = n())
obesity_summary
# A tibble: 4 × 4
  obesity_level min_bmi max_bmi observations
  <chr>           <dbl>   <dbl>        <int>
1 Normal           14.0    22.0          975
2 Obese            24.0    41.3          103
3 Overweight       22.0    24.0           87
4 Underweight      11.3    14.0           35
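Base R's `cut()` can build the same four categories. Unlike `case_when()`, it returns a factor whose levels follow BMI order rather than the alphabetical order in the table above. Below is a minimal sketch that assumes the cutoffs used in `case_when()`. Setting `right = FALSE` closes each interval on the left, so a BMI of exactly 14, 22, or 24 falls in the same category as in the `case_when()` version. `bmi_to_level()` is a hypothetical helper.

```r
# Hypothetical helper mirroring the case_when() cutoffs above
bmi_to_level <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 14, 22, 24, Inf),
      right  = FALSE,  # intervals are [-Inf,14), [14,22), [22,24), [24,Inf)
      labels = c("Underweight", "Normal", "Overweight", "Obese"))
}

lv <- bmi_to_level(c(11.3, 14, 21.9, 22, 24, 41.27))
as.character(lv)
# "Underweight" "Normal" "Normal" "Overweight" "Obese" "Obese"
levels(lv)
# "Underweight" "Normal" "Overweight" "Obese"
```

With `combined$obesity_level <- bmi_to_level(combined$bmi)`, `group_by(obesity_level)` would list the categories from Underweight to Obese.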

Create New Exposure Variable and Summary Table

combined <- combined %>%
  mutate(smoke_gas_exposure = case_when(
    smoke == 1 & gasstove == 1 ~ "Both",
    smoke == 1 & gasstove == 0 ~ "Second Hand Smoke Only",
    smoke == 0 & gasstove == 1 ~ "Gas Stove Only",
    smoke == 0 & gasstove == 0 ~ "Neither"))

smoke_summary <- combined %>%
  group_by(smoke_gas_exposure) %>%
  summarise(
    observations = n())
smoke_summary
# A tibble: 4 × 2
  smoke_gas_exposure     observations
  <chr>                         <int>
1 Both                            154
2 Gas Stove Only                  791
3 Neither                         219
4 Second Hand Smoke Only           36
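After imputation, `smoke` and `gasstove` are both 0/1 indicators with no missing values. The four exposure categories therefore correspond to the four cells of a 2-by-2 cross-tabulation, and `table(combined$smoke, combined$gasstove)` would be expected to reproduce the counts above. Below is a minimal sketch on toy indicator vectors (illustrative only).

```r
# Toy 0/1 indicators (not CHS data)
smoke    <- c(1, 1, 0, 0, 0, 1)
gasstove <- c(1, 0, 1, 1, 0, 1)

# Rows are smoke (0/1), columns are gasstove (0/1)
tab <- table(smoke = smoke, gasstove = gasstove)
tab
tab["1", "1"]  # "Both": 2
tab["0", "1"]  # "Gas Stove Only": 2
tab["1", "0"]  # "Second Hand Smoke Only": 1
tab["0", "0"]  # "Neither": 1
```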

Create Additional Summary Tables

summary_town <- combined %>%
  group_by(townname) %>%
  summarise(
    average_fev = mean(fev, na.rm = TRUE),
    sd_fev = sd(fev, na.rm = TRUE),
    .groups = "drop")
summary_sex <- combined %>%
  group_by(male) %>%
  summarise(
    average_fev = mean(fev, na.rm = TRUE),
    sd_fev = sd(fev, na.rm = TRUE),
    .groups = "drop")
summary_obesity <- combined %>%
  group_by(obesity_level) %>%
  summarise(
    average_fev = mean(fev, na.rm = TRUE),
    sd_fev = sd(fev, na.rm = TRUE),
    .groups = "drop")
summary_smoke_gas <- combined %>%
  group_by(smoke_gas_exposure) %>%
  summarise(
    average_fev = mean(fev, na.rm = TRUE),
    sd_fev = sd(fev, na.rm = TRUE),
    .groups = "drop")

summary_town
# A tibble: 12 × 3
   townname      average_fev sd_fev
   <chr>               <dbl>  <dbl>
 1 Alpine              2086.   291.
 2 Atascadero          2077.   324.
 3 Lake Elsinore       2039.   303.
 4 Lake Gregory        2085.   319.
 5 Lancaster           2006.   316.
 6 Lompoc              2038.   350.
 7 Long Beach          1987.   319.
 8 Mira Loma           1988.   325.
 9 Riverside           1990.   277.
10 San Dimas           2028.   319.
11 Santa Maria         2024.   311.
12 Upland              2028.   342.
summary_sex
# A tibble: 2 × 3
   male average_fev sd_fev
  <int>       <dbl>  <dbl>
1     0       1966.   313.
2     1       2099.   308.
summary_obesity
# A tibble: 4 × 3
  obesity_level average_fev sd_fev
  <chr>               <dbl>  <dbl>
1 Normal              2001.   294.
2 Obese               2267.   325.
3 Overweight          2224.   317.
4 Underweight         1697.   301.
summary_smoke_gas
# A tibble: 4 × 3
  smoke_gas_exposure     average_fev sd_fev
  <chr>                        <dbl>  <dbl>
1 Both                         2026.   300.
2 Gas Stove Only               2024.   319.
3 Neither                      2059.   328.
4 Second Hand Smoke Only       2057.   293.
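The four summary blocks above differ only in the grouping variable. Below is a minimal sketch of a hypothetical base R helper, `fev_summary_by()`, that computes the same mean and standard deviation of FEV for any grouping column. The toy data frame is illustrative only.

```r
# Hypothetical helper: mean and SD of fev within each level of `group`
fev_summary_by <- function(df, group) {
  means <- tapply(df$fev, df[[group]], mean, na.rm = TRUE)
  sds   <- tapply(df$fev, df[[group]], sd, na.rm = TRUE)
  data.frame(
    group       = names(means),
    average_fev = as.vector(means),
    sd_fev      = as.vector(sds),
    stringsAsFactors = FALSE
  )
}

# Toy data (not CHS values)
toy <- data.frame(fev = c(1, 3, 4, 6, 8),
                  g   = c("a", "a", "b", "b", "b"))
out <- fev_summary_by(toy, "g")
out  # group "a": mean 2, SD sqrt(2); group "b": mean 6, SD 2
```

On the real data, `lapply(c("townname", "male", "obesity_level", "smoke_gas_exposure"), fev_summary_by, df = combined)` would produce all four tables in one call.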

Exploratory Data Analysis

Association between BMI and Forced Expiratory Volume (FEV)

library(ggplot2)

ggplot(data = combined, mapping = aes(x = bmi, y = fev)) + 
  geom_point() +
  geom_smooth(method = "loess", col = "pink", se = FALSE) +
  labs(title = "Scatterplot of BMI vs Forced Expiratory Volume (mL)", 
       x = "BMI", 
       y = "FEV (mL)")
`geom_smooth()` using formula = 'y ~ x'

Based on this preliminary visualization, there appears to be a positive association between BMI and FEV. The loess curve rises until a BMI of about 30 and then turns slightly negative. Relatively few children have BMI values that high, so this part of the curve is less reliable.
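Below is a minimal sketch that puts numbers on the pattern in the scatterplot. It computes the Pearson and Spearman correlations and the slope of a simple linear fit. `describe_association()` is a hypothetical helper. Spearman's rank correlation is less sensitive to the curvature the loess line suggests at high BMI. The toy vectors are illustrative only.

```r
# Hypothetical helper: three simple summaries of how y moves with x
describe_association <- function(x, y) {
  keep <- complete.cases(x, y)
  x <- x[keep]
  y <- y[keep]
  fit <- lm(y ~ x)
  c(pearson  = cor(x, y),
    spearman = cor(x, y, method = "spearman"),
    slope    = unname(coef(fit)[2]))  # change in y per one-unit change in x
}

x <- c(12, 15, 18, 21, 24)
y <- 2 * x + 1  # perfectly linear toy data
assoc <- describe_association(x, y)
assoc  # pearson 1, spearman 1, slope 2
```

On the real data, the call would be `describe_association(combined$bmi, combined$fev)`.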

Association between Smoke and Gas Exposure and Forced Expiratory Volume (FEV)

labels_data <- combined %>%
  group_by(smoke_gas_exposure) %>%
  summarise(mean_fev = mean(fev, na.rm = TRUE))

combined |>
  ggplot(mapping = aes(x = smoke_gas_exposure, y = fev, fill = smoke_gas_exposure)) + 
  geom_boxplot() +
  scale_fill_brewer(palette = "RdPu") +
  labs(title = "Forced Expiratory Volume (mL) by Smoke and Gas Exposure",
       x = "Smoke and Gas Exposure", 
       y = "Forced Expiratory Volume (mL)") +
  geom_text(data = labels_data, aes(x = smoke_gas_exposure, y = mean_fev, label = round(mean_fev, 1)),
            vjust = -0.75, color = "black", size = 3) +
  theme_minimal()

Based on this preliminary visualization, FEV does not appear to differ substantially across the smoke and gas exposure categories, although a formal test is needed to confirm this.
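One way to follow up formally is a Kruskal-Wallis rank-sum test. It asks whether FEV tends to differ across the exposure groups without assuming normality. A one-way ANOVA via `aov()` would be the parametric alternative. Below is a minimal sketch on toy data (illustrative only).

```r
# Toy FEV values for three exposure groups (not CHS data)
toy <- data.frame(
  fev      = c(1900, 2050, 2010, 1980, 2100, 2075, 1950, 2020, 2000),
  exposure = rep(c("Neither", "Gas Stove Only", "Both"), each = 3)
)

# Kruskal-Wallis rank-sum test of fev across exposure groups
kw <- kruskal.test(fev ~ exposure, data = toy)
kw$statistic  # H statistic
kw$p.value    # a small p-value would suggest at least one group differs
```

On the real data, the call would be `kruskal.test(fev ~ smoke_gas_exposure, data = combined)`.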

Association between PM2.5 Exposure and Forced Expiratory Volume (FEV)

combined <- combined %>%
  mutate(pm25_exposure = pm25_so4 +pm25_no3 + pm25_nh4 + pm25_oc + pm25_ec + pm25_om)
ggplot(data = combined, mapping = aes(x = pm25_exposure, y = fev)) + 
  geom_point() +
  geom_smooth(method = "loess", col = "pink", se = FALSE) +
  labs(title = "Scatterplot of PM2.5 Exposure vs Forced Expiratory Volume (mL)", 
       x = "PM2.5 Exposure", 
       y = "FEV (mL)")
`geom_smooth()` using formula = 'y ~ x'

Based on this preliminary visualization, there appears to be a weak negative relationship between PM2.5 exposure and forced expiratory volume (FEV).
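`rowSums()` over a vector of column names builds the same composite more compactly and makes the list of components easy to vary. That flexibility may matter here. The summary of `combined` shows `pm25_om` at almost exactly 1.2 times `pm25_oc` at every reported quantile (for example, medians of 4.840 and 4.035). This suggests organic matter was derived from organic carbon, in which case summing both counts organic carbon twice. Below is a minimal sketch. `add_pm25_composite()` is a hypothetical helper, and the toy values are illustrative only.

```r
# Component columns used for pm25_exposure above
pm25_components <- c("pm25_so4", "pm25_no3", "pm25_nh4",
                     "pm25_oc", "pm25_ec", "pm25_om")

# Hypothetical helper: row-wise sum of the chosen component columns
add_pm25_composite <- function(df, cols = pm25_components) {
  df$pm25_exposure <- rowSums(df[, cols, drop = FALSE])
  df
}

# Toy component values (not CHS data); om is set to 1.2 x oc
toy <- data.frame(pm25_so4 = c(1, 2), pm25_no3 = c(2, 1), pm25_nh4 = c(0.5, 0.5),
                  pm25_oc  = c(2, 4), pm25_ec  = c(0.5, 1), pm25_om  = c(2.4, 4.8))

with_om    <- add_pm25_composite(toy)
without_om <- add_pm25_composite(toy, setdiff(pm25_components, "pm25_om"))
with_om$pm25_exposure     # 8.4 13.3
without_om$pm25_exposure  # 6.0 8.5
```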

Data Visualization

Scatterplots of BMI vs FEV by Town

combined[!is.na(combined$townname), ] |> 
  ggplot() + 
  geom_point(mapping = aes(x = bmi, y = fev)) + 
  facet_wrap(~ townname, scales = "free") +
  geom_smooth(mapping = aes(x = bmi, y = fev), method = "loess", col = "pink", se = FALSE) +
  labs(title = "Scatterplots of BMI vs Forced Expiratory Volume (FEV) by Town", 
       x = "BMI", 
       y = "FEV (mL)")
`geom_smooth()` using formula = 'y ~ x'

Although the association between BMI and FEV varies between towns, most towns appear to show a positive and strong relationship (further analysis is required to confirm this). Some towns (like Atascadero) show a more linear relationship between the two variables than others (like Alpine, Lake Gregory, and Mira Loma). In some towns (like Lompoc, Mira Loma, and San Dimas), the loess curves also turn downward at higher BMI values. Further analysis is required to investigate the possible cause of this.
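Below is a minimal sketch that compares towns more directly than the facet panels allow. It fits a separate straight-line regression of FEV on BMI within each town and returns the slopes. A single linear slope cannot capture the downturn some loess curves show at high BMI, so it complements the plots rather than replacing them. `slopes_by_group()` is a hypothetical helper, and the toy data are illustrative only.

```r
# Hypothetical helper: slope of y on x fitted separately within each group
slopes_by_group <- function(df, group, x, y) {
  pieces <- split(df, df[[group]])
  sapply(pieces, function(d) unname(coef(lm(d[[y]] ~ d[[x]]))[2]))
}

# Toy data (not CHS values): town A rises, town B falls
toy <- data.frame(
  town = rep(c("A", "B"), each = 4),
  bmi  = c(14, 16, 18, 20, 14, 16, 18, 20),
  fev  = c(1800, 1900, 2000, 2100, 2100, 2050, 2000, 1950)
)
s <- slopes_by_group(toy, "town", "bmi", "fev")
s  # A: 50, B: -25 (mL of FEV per BMI unit)
```

On the real data, the call would be `slopes_by_group(combined, "townname", "bmi", "fev")`.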

Stacked histograms of FEV by BMI category

ggplot(combined, aes(x = fev, fill = factor(obesity_level))) +
  geom_histogram(position = "stack", bins = 25) +
  labs(title = "Stacked Histogram of FEV by BMI Category",
       x = "FEV (mL)",
       y = "Count") +
  scale_fill_brewer(palette = "RdPu") +
  theme_minimal()

Based on this stacked histogram, most observations fall in the “Normal” BMI category, with far fewer in “Obese”, “Overweight”, and “Underweight” (in descending order). FEV values in the “Normal” category are concentrated around 2000 mL, with counts tapering off in either direction from this peak (a unimodal, roughly normal shape). The “Obese” and “Underweight” distributions are also unimodal, while “Overweight” is not. The peaks are also shifted relative to “Normal” (around 2250 mL for “Obese” and 1650 mL for “Underweight”).
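The “roughly normal” description could be checked group by group with a Shapiro-Wilk test. Below is a minimal sketch; `normality_by_group()` is a hypothetical helper, and the simulated values are synthetic. Two caveats apply. First, with about 975 children in the “Normal” group, the test will flag even small departures from normality, so it is best read alongside the histogram or a Q-Q plot. Second, the 95 missing FEV values were all filled with 2031.3, and that single repeated value may itself contribute to the peak near 2000 mL.

```r
# Hypothetical helper: Shapiro-Wilk p-value for each group
normality_by_group <- function(values, groups) {
  sapply(split(values, groups), function(v) {
    v <- v[!is.na(v)]
    # shapiro.test() accepts between 3 and 5000 non-missing values
    if (length(v) < 3 || length(v) > 5000) return(NA_real_)
    shapiro.test(v)$p.value
  })
}

# Synthetic data (not CHS values)
set.seed(566)
vals <- c(rnorm(50, mean = 2000, sd = 300), rnorm(50, mean = 1700, sd = 300))
grp  <- rep(c("Normal", "Underweight"), each = 50)
p <- normality_by_group(vals, grp)
p  # large p-values are consistent with (but do not prove) normality
```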

Stacked histograms of FEV by Smoke and Gas Exposure Category

ggplot(combined, aes(x = fev, fill = factor(smoke_gas_exposure))) +
  geom_histogram(position = "stack", bins = 25) +
  labs(title = "Stacked Histogram of FEV by Smoke and Gas Exposure Category",
       x = "FEV (mL)",
       y = "Count") +
  scale_fill_brewer(palette = "RdPu") +
  theme_minimal()

In this stacked histogram, most observations come from the “Gas Stove Only” category (791 of 1,200, per the exposure summary table above), followed by “Neither” (219) and “Both” (154). The distributions for all four categories appear unimodal and roughly normal, with peaks concentrated around an FEV value of 2000 mL.

Bar Chart of BMI by Smoke and Gas Exposure

ggplot(data = combined, aes(x = cut(bmi, breaks = 15), fill = factor(smoke_gas_exposure))) + 
  geom_bar(position = "stack") +
  scale_fill_brewer(palette = "RdPu") +
  labs(title = "Stacked Bar Chart of BMI by Smoke and Gas Exposure",
       x = "BMI",
       y = "Count") +
  theme_minimal()

This stacked bar chart shows that most observations fall in the “Gas Stove Only” category, consistent with the exposure summary table. The overall BMI distribution is unimodal, peaking at BMI values of roughly 15-19. It is also skewed to the right: most observations have smaller BMI values, with a long tail extending toward larger ones.
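Right skew can be quantified with the sample skewness, where a positive value indicates a longer right tail. Below is a minimal sketch of the simple moment-based estimator (often written g1); `sample_skewness()` is a hypothetical helper. For the BMI data, the summary above already hints at right skew, since the mean (18.50) exceeds the median (17.48).

```r
# Hypothetical helper: moment-based sample skewness (g1)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3 / 2)
}

sym    <- sample_skewness(c(1, 2, 3, 4, 5))   # 0 for a symmetric sample
skewed <- sample_skewness(c(1, 1, 1, 2, 10))  # positive: long right tail
```

On the real data, the call would be `sample_skewness(combined$bmi)`.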

Boxplot (Statistical Summary Graph) of FEV by Obesity Level

labels_data2 <- combined %>%
  group_by(obesity_level) %>%
  summarise(mean_fev = mean(fev, na.rm = TRUE))

ggplot(data = combined, aes(x = obesity_level, y = fev, fill = obesity_level)) + 
  geom_boxplot() +
  labs(title = "Boxplot of Forced Expiratory Volume by Obesity Level",
       x = "Obesity Level",
       y = "FEV (mL)") +
  scale_fill_brewer(palette = "RdPu") +
  geom_text(data = labels_data2, aes(x = obesity_level, y = mean_fev, label = round(mean_fev, 1)),
            vjust = -0.75, color = "black", size = 3) +
  theme_minimal()

Comparing the boxplots of FEV across obesity levels, we can see clear differences in median FEV between BMI categories. The medians for “Obese” and “Overweight” are relatively close and are the two highest, while the median for the “Normal” group is somewhat lower. The “Underweight” group has the lowest median FEV of the four (around 300 mL below the “Normal” median and around 450 mL below the other two categories).
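The differences quoted above are read off the plot. Below is a minimal sketch that computes each group's median FEV minus the median of a chosen reference group. `median_diff_from()` is a hypothetical helper, and the toy values are illustrative only.

```r
# Hypothetical helper: group medians minus the reference group's median
median_diff_from <- function(values, groups, reference) {
  meds <- tapply(values, groups, median, na.rm = TRUE)
  meds <- setNames(as.vector(meds), names(meds))
  meds - meds[[reference]]
}

# Toy data (not CHS values)
vals <- c(1600, 1700, 1800, 1950, 2000, 2050, 2200, 2250, 2300)
grp  <- rep(c("Underweight", "Normal", "Obese"), each = 3)
d <- median_diff_from(vals, grp, reference = "Normal")
d  # Normal = 0, Obese = 250, Underweight = -300
```

On the real data, the call would be `median_diff_from(combined$fev, combined$obesity_level, reference = "Normal")`.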

Boxplot (Statistical Summary Graph) of FEV by Smoke and Gas Exposure Category

labels_data <- combined %>%
  group_by(smoke_gas_exposure) %>%
  summarise(mean_fev = mean(fev, na.rm = TRUE))

ggplot(data = combined, aes(x = smoke_gas_exposure, y = fev, fill = smoke_gas_exposure)) + 
  geom_boxplot() +
  labs(title = "Boxplot of Forced Expiratory Volume by Smoke and Gas Exposure Category",
       x = "Smoke and Gas Exposure Category",
       y = "FEV (mL)") +
  scale_fill_brewer(palette = "RdPu") +
  geom_text(data = labels_data, aes(x = smoke_gas_exposure, y = mean_fev, label = round(mean_fev, 1)),
            vjust = -0.75, color = "black", size = 3) +
  theme_minimal()

As discussed previously, FEV values are relatively similar across the four smoke and gas exposure categories. The “Neither” group has the highest mean FEV (2059.1 mL, the value labeled on the plot), and the “Gas Stove Only” group has the lowest (2023.5 mL). However, further analysis is required to determine whether these differences are statistically significant.
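If an overall test suggested a difference, pairwise Wilcoxon rank-sum tests with a multiple-comparison correction would show which pairs of exposure groups differ. Below is a minimal sketch using `pairwise.wilcox.test()` from base R's stats package with Holm-adjusted p-values. The toy values are illustrative only.

```r
# Toy FEV values for the four exposure groups (not CHS data)
vals <- c(1900, 1950, 2000, 2010, 2060, 2100,
          2030, 2040, 2120, 1990, 2070, 2150)
grp  <- rep(c("Both", "Gas Stove Only", "Neither",
              "Second Hand Smoke Only"), each = 3)

# Pairwise rank-sum tests; Holm's method adjusts for the 6 comparisons
pw <- pairwise.wilcox.test(vals, grp, p.adjust.method = "holm")
pw$p.value  # lower-triangular matrix of adjusted p-values
```

On the real data, the call would be `pairwise.wilcox.test(combined$fev, combined$smoke_gas_exposure, p.adjust.method = "holm")`.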

Map showing the concentrations of PM2.5 mass in each of the CHS communities

library(leaflet)
leaflet(data = combined) |> 
  addProviderTiles('CartoDB.Positron') |> 
  addCircles(lat = ~lat, lng = ~lon, 
             opacity = 1,
             fillOpacity = 0.7,
             radius = ~pm25_mass * 200,
             color = "pink",
             popup = ~paste(townname, "<br>", "PM2.5 Mass:", pm25_mass, "µg/m³")) |> 
  addLegend(position = "bottomright", 
            colors = "pink", 
            labels = "PM2.5 Mass Concentrations",
            title = "Legend")

This leaflet map of PM2.5 mass concentrations across the 12 communities in this study suggests higher concentrations in communities closer to Los Angeles. This is plausible, as urban sources of air pollution likely contribute heavily to PM2.5 mass. The lowest concentrations appear in the northernmost communities in the study. These are also located along or closer to California’s coast and may therefore benefit geographically from better overall air quality.
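`pm25_mass` is a regional variable, yet all 1,200 rows of `combined` are passed to `leaflet()`. The map therefore draws one circle per child, stacking many identical semi-transparent circles at each town so that each stack looks nearly opaque. Below is a minimal sketch that reduces the data to one row per community before mapping. `town_level()` is a hypothetical helper, and the toy coordinates are illustrative only.

```r
# Hypothetical helper: one row per community, highest PM2.5 mass first
town_level <- function(df) {
  out <- unique(df[, c("townname", "lat", "lon", "pm25_mass")])
  out[order(out$pm25_mass, decreasing = TRUE), ]
}

# Toy data (not CHS values): several children per town share the same exposure
toy <- data.frame(
  townname  = c("A", "A", "B", "B", "B", "C"),
  lat       = c(34.1, 34.1, 35.0, 35.0, 35.0, 33.0),
  lon       = c(-117.5, -117.5, -120.5, -120.5, -120.5, -116.8),
  pm25_mass = c(20, 20, 6, 6, 6, 10),
  fev       = c(2000, 2100, 1900, 2050, 1980, 2020),
  stringsAsFactors = FALSE
)
towns <- town_level(toy)
towns$townname  # "A" "C" "B"
```

`leaflet(data = town_level(combined))`, followed by the same `addProviderTiles()`, `addCircles()`, and `addLegend()` calls, would then draw one circle per community.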

PM2.5 Mass and FEV Associations

summary(combined$pm25_mass)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  5.960   7.615  10.545  14.362  20.988  29.970 
ggplot(data = combined, mapping = aes(x = pm25_mass, y = fev)) + 
  geom_point() +
  geom_smooth(method = "loess", col = "pink", se = FALSE) +
  labs(title = "Scatterplot of PM2.5 Mass vs Forced Expiratory Volume (mL)", 
       x = "PM2.5 Mass (µg/m³)", 
       y = "FEV (mL)") + 
  xlim(range(combined$pm25_mass))
`geom_smooth()` using formula = 'y ~ x'

Based on this scatterplot, there appears to be a weak negative association between PM2.5 mass and forced expiratory volume (FEV).
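`pm25_mass` takes only one value per community, so the scatterplot has just 12 distinct x positions, and each child's FEV is plotted against the same community-level exposure. Below is a minimal complementary sketch that averages FEV within each community and correlates those averages with PM2.5 mass. With only 12 points and no adjustment for other differences between communities, this is a rough ecological check rather than a formal test. `town_level_association()` is a hypothetical helper, and the toy data are illustrative only.

```r
# Hypothetical helper: community-mean FEV vs community PM2.5 mass
town_level_association <- function(df) {
  town_means <- aggregate(fev ~ townname + pm25_mass, data = df, FUN = mean)
  list(town_means = town_means,
       spearman   = cor(town_means$pm25_mass, town_means$fev,
                        method = "spearman"))
}

# Toy data (not CHS values)
toy <- data.frame(
  townname  = rep(c("A", "B", "C"), each = 2),
  pm25_mass = rep(c(6, 12, 24), each = 2),
  fev       = c(2100, 2140, 2050, 2070, 1990, 2010),
  stringsAsFactors = FALSE
)
res <- town_level_association(toy)
res$spearman  # -1: mean FEV falls as PM2.5 mass rises in this toy example
```

On the real data, the call would be `town_level_association(combined)`.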